Pattern identification method, apparatus, and program

ABSTRACT

Pattern recognition capable of robust identification for the variance of an input pattern is performed with a low processing cost while the possibility of identification errors is decreased. In a pattern recognition apparatus which identifies the pattern of input data from a data input unit ( 11 ) by using a hierarchical feature extraction processor ( 12 ) which hierarchically extracts features, an extraction result distribution analyzer ( 13 ) analyzes a distribution of at least one feature extraction result obtained by a primary feature extraction processor ( 121 ). On the basis of the analytical result, a secondary feature extraction processor ( 122 ) performs predetermined secondary feature extraction.

TECHNICAL FIELD

The present invention relates to a method, apparatus, and program for identifying the pattern of an input signal by hierarchically extracting features in, e.g., image recognition or voice recognition.

BACKGROUND ART

There is a technique which identifies the pattern of an input signal by hierarchically extracting features. This method extracts a high-order feature by using features which form the feature to be extracted and have orders lower than that of the feature to be extracted. Accordingly, the method has the characteristic that it can perform robust identification for the variance of an identification pattern. However, to increase the robustness against the variance of a pattern, it is necessary to increase the number of types of features to be extracted, and this increases the processing cost. If the number of types of features to be extracted is not increased, the possibility of identification errors increases.

To solve the above problems, the following pattern recognition method is proposed. First, feature vectors of patterns of individual classes are arranged in descending order of vector component dispersion to form dictionary patterns, and feature vectors are generated from an input pattern. Then, matching with dictionary patterns of high orders up to the Nth order is performed. On the basis of the matching result, matching with lower orders is performed. In this manner, the processing cost can be reduced.

The following pattern recognition dictionary formation apparatus and pattern recognition apparatus are also proposed. First, feature vectors are extracted from an input pattern and classified into clusters in accordance with the degree of matching with the standard vector of each cluster. Category classification is then performed in accordance with the degree of matching between category standard vectors in the classified clusters of the input pattern and the feature vectors. Consequently, the cost of the matching process can be reduced.

DISCLOSURE OF INVENTION

However, it is desired to perform pattern recognition capable of robust identification for the variance of an input pattern, and to reduce the processing cost while decreasing the possibility of identification errors.

To solve the above problems, according to the present invention, a pattern identification method of identifying a pattern of input data by hierarchically extracting features of the input data comprises a first feature extraction step of extracting a feature of a first layer, an analysis step of analyzing a distribution of a feature extraction result in the first feature extraction step, and a second feature extraction step of extracting a feature of a second layer higher than the first layer on the basis of the distribution analyzed in the analysis step.

According to another aspect of the present invention, a pattern identification apparatus for identifying a pattern of input data by hierarchically extracting features of the input data comprises first feature extracting means for extracting a feature of a first layer, analyzing means for analyzing a distribution of a feature extraction result obtained by the first feature extracting means, and second feature extracting means for extracting a feature of a second layer higher than the first layer on the basis of the distribution analyzed by the analyzing means.

According to still another aspect of the present invention, a pattern identification program for allowing a computer to identify a pattern of input data by hierarchically extracting features of the input data comprises a first feature extraction step of extracting a feature of a first layer, an analysis step of analyzing a distribution of a feature extraction result in the first feature extraction step, and a second feature extraction step of extracting a feature of a second layer higher than the first layer on the basis of the distribution analyzed in the analysis step.

According to still another aspect of the present invention, there is provided a pattern identification method of identifying a pattern of input data by hierarchically extracting features of the input data, comprising a first feature extraction step of extracting a feature of a first layer, and a second feature extraction step of extracting a feature of a second layer higher than the first layer by one on the basis of a feature extraction result in the first layer and a feature extraction result in a layer other than the first layer.

According to still another aspect of the present invention, a pattern identification apparatus for identifying a pattern of input data by hierarchically extracting features of the input data comprises first feature extraction means for extracting a feature of a first layer, and second feature extraction means for extracting a feature of a second layer higher than the first layer by one on the basis of a feature extraction result in the first layer and a feature extraction result in a layer other than the first layer.

According to still another aspect of the present invention, a pattern identification program for causing a computer to identify a pattern of input data by hierarchically extracting features of the input data comprises a first feature extraction step of extracting a feature of a first layer, and a second feature extraction step of extracting a feature of a second layer higher than the first layer by one on the basis of a feature extraction result in the first layer and a feature extraction result in a layer other than the first layer.

Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1A is a view showing the basic arrangement of a pattern identification apparatus according to the first embodiment;

FIG. 1B is a view showing the basic arrangement of the pattern identification apparatus according to the first embodiment;

FIG. 2 is a view showing the functional arrangement of the pattern identification apparatus according to the first embodiment;

FIG. 3 is a flowchart showing the flow of processing in the first embodiment;

FIG. 4 is a view showing face images as an identification category in the first embodiment;

FIG. 5 is a view showing four types of initial feature extraction results;

FIG. 6 is a view showing the initial feature extraction results at positions where local features to be extracted are present;

FIG. 7 is a view showing the arrangement of a basic convolutional neural network;

FIG. 8 is a view showing the functional arrangement of a pattern identification apparatus according to the second embodiment;

FIGS. 9A and 9B are flowcharts showing the flow of processing in the second embodiment;

FIG. 10 is a view showing the functional arrangement of a pattern identification apparatus according to the third embodiment;

FIGS. 11A and 11B are flowcharts showing the flow of processing in the third embodiment;

FIG. 12 is a view showing the block configuration of a computer which implements the present invention;

FIG. 13 is a view showing the hierarchical structure according to the fourth embodiment;

FIG. 14A is a view for explaining an integrating process according to the fourth embodiment; and

FIG. 14B is a view for explaining the integrating process according to the fourth embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

Preferred embodiments of the present invention will now be described in detail in accordance with the accompanying drawings.

(First Embodiment)

As the first embodiment of the present invention, a method of identifying whether input two-dimensional image data belongs to a certain specific category will be explained below.

This embodiment assumes, as identification categories, face images as indicated by i to iv in FIG. 4, in each of which the center of a face is present in substantially the center of an input image, and a non-face image as indicated by v in FIG. 4, which is not a face image. A method of identifying whether input image data is the former or the latter of these two categories will be described below.

In this embodiment, identification of whether input image data is a face image or not will be explained. However, the application of the present invention is not limited to such images. That is, the present invention is also applicable to other image patterns or to a case in which input data is voice data. In addition, to simplify the explanation, identification of whether input image data falls under a single category, i.e., a face, will be described below. However, the present invention is applicable not only to identification of a single category but also to identification of a plurality of categories.

FIGS. 1A and 1B illustrate the basic arrangements of a pattern identification apparatus. An outline of this pattern identification apparatus will be described below with reference to FIGS. 1A and 1B.

A data input unit 11 shown in FIG. 1A inputs data as an object of pattern identification. A hierarchical feature extraction processor 12 hierarchically extracts features from the input data, and identifies the pattern of the input data. The hierarchical feature extraction processor 12 includes a primary feature extraction processor 121 for performing a primary feature extraction process, and a secondary feature extraction processor 122 for performing a secondary feature extraction process. An extraction result distribution analyzer 13 analyzes the distribution of features extracted by the primary feature extraction processor 121.

In this pattern identification apparatus, the data input unit 11 inputs data to be identified. The hierarchical feature extraction processor 12 performs a hierarchical feature extraction process for this input data. In this hierarchical extraction process, the primary feature extraction processor 121 hierarchically extracts a plurality of primary features from the input data. Then, the extraction result distribution analyzer 13 analyzes the distribution of at least one type of primary feature extracted by the primary feature extraction processor 121. In addition, on the basis of the result of analysis, the secondary feature extraction processor 122 extracts secondary features.

FIG. 1B shows another basic arrangement of the pattern identification apparatus. An outline of this pattern identification apparatus will be explained below with reference to FIG. 1B.

Referring to FIG. 1B, a data input unit 11 inputs data as an object of pattern identification. A hierarchical feature extraction processor 12 hierarchically extracts features from the input data, and identifies the pattern of the input data. The hierarchical feature extraction processor 12 includes a primary feature extraction processor 121 for performing a primary feature extraction process, and a secondary feature extraction processor 122 for performing a secondary feature extraction process. An extraction result distribution analyzer 13 analyzes the distribution of features extracted by the primary feature extraction processor 121. A category likelihood calculator 14 calculates the likelihood of each category of secondary features from the result of analysis by the extraction result distribution analyzer 13.

In this pattern identification apparatus, the data input unit 11 inputs data to be identified. The hierarchical feature extraction processor 12 performs a hierarchical feature extraction process for this input data. In this hierarchical extraction process, the primary feature extraction processor 121 hierarchically extracts a plurality of primary features from the input data. Then, the extraction result distribution analyzer 13 analyzes the distribution of at least one type of primary feature extracted by the primary feature extraction processor 121. On the basis of the result of analysis by the extraction result distribution analyzer 13, the category likelihood calculator 14 calculates the likelihood of each category of secondary features to be extracted by the secondary feature extraction processor 122. The secondary feature extraction processor 122 extracts secondary features which belong to categories each having a calculated likelihood equal to or larger than a predetermined value.

FIG. 2 shows the functional arrangement of the pattern identification apparatus according to this embodiment. FIG. 3 shows the flow of processing in this embodiment. The processing in this embodiment will be described below with reference to FIGS. 2 and 3. Referring to FIG. 2, the solid-line arrows indicate the flows of actual signal data, and the broken-line arrow indicates the flow of instruction signals, such as operation instructions, rather than actual signal data. The same expression is used in FIGS. 8 and 10 (to be described later).

First, in step S301, an image input unit 21 inputs image data as an object of identification. Although this input image data is a grayscale image in this embodiment, an RGB color image may also be used.

In step S302, an initial feature extractor 22 extracts at least one initial feature, such as an edge in a specific direction, of the input image. In step S303, a local feature extractor 23 extracts local features, e.g., an edge line segment having a specific length and the end points of this edge line segment, by using the initial features extracted by the initial feature extractor 22. In step S304, a partial feature extractor 24 extracts partial features such as the eye and mouth by using the local features extracted by the local feature extractor 23.

In step S305, a partial feature distribution determinator 25 analyzes the distribution, in the image, of the partial features extracted by the partial feature extractor 24. In step S306, in accordance with the analytical result, the partial feature distribution determinator 25 issues an activation instruction to a face extractor 26, and turns on the flags of face extraction modules to be activated.

The face extractor 26 is a processor which extracts the face by using the partial features extracted by the partial feature extractor 24. The face extractor 26 is made up of a plurality of modules each of which extracts the face in accordance with a specific size or direction, and only modules having received the activation instruction perform face extraction. In steps S307 to S309, face extraction modules having ON flags sequentially perform the face extraction process, and the flag of each face extraction module having executed face extraction is turned off. If there is no more face extraction module having an ON flag, the face extraction process is terminated.

In steps S310 and S311, a detection result output unit 27 integrates the face extraction results from the face extraction modules, determines whether the input image is a face image or a non-face image, and outputs the determination result.

Details of the processing performed by each processor on and after the initial feature extractor 22 for the image data input from the image input unit 21 will be described below.

Initial features extracted from the input image by the initial feature extractor 22 are desirably constituent elements of features to be extracted by the local feature extractor 23 as a higher layer. In this embodiment, a filtering process is simply performed in each position of an input image by using differential filters in a longitudinal direction, a lateral direction, an oblique direction toward the upper right corner, and an oblique direction toward the upper left corner, thereby extracting four types of features: a vertical edge, a horizontal edge, and two oblique edges. Although the filtering process as described above is performed in this embodiment, it is also possible to extract features by performing template matching in each position of an input image by using a prepared template image indicating initial features.
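
The following is a minimal sketch of this directional filtering step, assuming a NumPy/SciPy environment; the 3x3 differential kernels and their names are illustrative choices, not values taken from the embodiment.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical 3x3 differential kernels for the four edge directions
# (vertical, horizontal, and the two oblique directions).
KERNELS = {
    "vertical":      np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    "horizontal":    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),
    "oblique_right": np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float),
    "oblique_left":  np.array([[2, 1, 0], [1, 0, -1], [0, -1, -2]], dtype=float),
}

def extract_initial_features(gray_image: np.ndarray) -> dict:
    """Filter the image at every position with each directional kernel.

    Positive responses indicate an edge in the kernel's direction,
    negative responses an edge of the opposite polarity, and values
    near zero mean no edge (the gray regions in FIG. 5)."""
    return {name: convolve(gray_image.astype(float), k, mode="nearest")
            for name, k in KERNELS.items()}
```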

The extracted feature is held as information such as the type of the feature, the position in the image, the likelihood of the feature to be extracted, and the feature detection level. In this embodiment, features indicated by a to d in FIG. 5 are extracted from the input image (i in FIG. 4) in this stage. Referring to FIG. 5, a, b, c, and d indicate the extraction results of a vertical edge, a horizontal edge, a rightward oblique edge, and a leftward oblique edge.

In FIG. 5, a position where the result of filtering in each position of the image is 0 is gray, positive values are represented by high luminance values, and negative values are represented by low luminance values. That is, in the images shown in FIG. 5, in a position having a high luminance value, an edge in a direction corresponding to the type of each filter is extracted. In a position having a low luminance value, an edge in a direction opposite to the direction corresponding to the type of each filter is present. A gray portion having an intermediate luminance value indicates a position where no edge is extracted.

Since differential filters are used to extract features, the absolute values of the values obtained by filtering indicate the sharpness of edges. That is, in each position of the input image, the larger the change in luminance value in a direction corresponding to the type of filter, the larger or smaller the luminance value of the position.

Similar to the features extracted by the initial feature extractor 22, the local features extracted by the local feature extractor 23 by using the initial feature extraction results obtained by the initial feature extractor 22 are desirably constituent elements of features to be extracted by the partial feature extractor 24 as a higher layer.

In this embodiment, the partial feature extractor 24 extracts the eye and mouth. Therefore, the local feature extractor 23 extracts features as indicated by the portions surrounded by circles in 1-a to 4-d of FIG. 6. That is, the local feature extractor 23 extracts two types of features, i.e., the left and right end points as the end points of an edge line segment corresponding to, e.g., the corners of the eye or the two ends of the mouth. The local feature extractor 23 also extracts two types of edge line segments having specific lengths, i.e., a feature corresponding to the upper portion of the eye or the upper portion of the lips, and a feature corresponding to the lower portion of the eye or the lower portion of the lips.

1-a to 1-d in FIG. 6 indicate the initial feature extraction results in a position where the left end point (the inner corner of the left eye in FIG. 6) is present. That is, 1-a, 1-b, 1-c, and 1-d indicate the extraction results of the vertical edge, the horizontal edge, the rightward oblique edge, and the leftward oblique edge, respectively. 2-a, 2-b, 2-c, and 2-d indicate the extraction results of the initial features (vertical, horizontal, rightward oblique, and leftward oblique edges, respectively) in a position where the right end point (the end point of the mouth in FIG. 6) is present. 3-a, 3-b, 3-c, and 3-d indicate the extraction results of the initial features (vertical, horizontal, rightward oblique, and leftward oblique edges, respectively) in a position where the upper portion of the eye or the upper portion of the lips (the upper portion of the right eye in FIG. 6) is present. 4-a, 4-b, 4-c, and 4-d indicate the extraction results of the initial features (vertical, horizontal, rightward oblique, and leftward oblique edges, respectively) in a position where the lower portion of the eye or the lower portion of the lips (the lower portion of the lips in FIG. 6) is present.

In this embodiment, a method of extracting each feature is as follows. First, a two-dimensional mask unique to each feature extracted by the initial feature extractor 22 is prepared. Then, in each position of the feature extraction results as indicated by a to d in FIG. 5, a filtering process (a convolution operation) is performed using the two-dimensional mask unique to the feature to be extracted. Each feature is extracted by integrating the results of filtering performed for the individual initial feature extraction results.
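
A minimal sketch of this step is shown below, assuming NumPy/SciPy; summation is used as the integration of the per-initial-feature filtering results, which is one plausible reading of the text rather than the only possible one.

```python
import numpy as np
from scipy.ndimage import convolve

def extract_local_feature(initial_maps: dict, masks_for_feature: dict) -> np.ndarray:
    """Convolve each initial feature map with the 2-D mask prepared for one
    local feature (e.g. the left end point) and integrate the results.

    initial_maps maps an initial feature name ('vertical', ...) to its
    extraction result; masks_for_feature maps the same names to the 2-D
    masks prepared for this local feature."""
    integrated = None
    for name, feature_map in initial_maps.items():
        response = convolve(feature_map, masks_for_feature[name], mode="nearest")
        integrated = response if integrated is None else integrated + response
    # The integrated value is held as the likelihood of the local feature
    # at each position of the image.
    return integrated
```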

The prepared unique two-dimensional mask corresponds to the distribution (1-a to 1-d in FIG. 6) of the initial feature extraction results in a position where the feature to be extracted (e.g., the feature such as the left end point in FIG. 6) is present. That is, the two-dimensional mask is so set that the value obtained by filtering is large if the initial feature extraction result distribution is unique around the position where the feature to be extracted is present.

The two-dimensional mask is set as follows. First, a plurality of test patterns are simply given, and the value of each element of the two-dimensional mask is so adjusted that the result of filtering has a large value if the given test pattern is a feature to be extracted. Also, the value of each element of the two-dimensional mask is so adjusted that the result of filtering has a small value if the given test pattern is not a feature to be extracted. It is also possible to set the value of each element of the two-dimensional mask by using knowledge obtained in advance.
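
As one hedged illustration of such an adjustment, a matched-filter style rule (mean of positive test patches minus mean of negative test patches) produces a mask whose filtering response is large on target patterns and small otherwise; this particular rule is an assumption and stands in for whatever adjustment procedure is actually used.

```python
import numpy as np

def learn_mask(positive_patches: list, negative_patches: list) -> np.ndarray:
    """Set the mask elements from test patterns: large filtering response
    for patches containing the target feature, small response otherwise.
    positive_patches / negative_patches are lists of equally sized 2-D
    arrays cut out of the initial feature extraction results."""
    pos = np.mean(np.stack(positive_patches), axis=0)
    neg = np.mean(np.stack(negative_patches), axis=0)
    return pos - neg
```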

As in the initial feature extractor 22, each feature extracted by the processing as described above is held as information such as the type of the extracted feature, the position in the image, the likelihood of the feature to be extracted, and the feature detection level. In this embodiment, for each of the four types of features, i.e., the two types of end points and the edge line segments having the two types of specific lengths, filtering is performed for each initial feature by using the position where the feature is extracted and the two-dimensional mask unique to the feature. The results of filtering are integrated and recorded as the likelihood of the feature.

The processing performed by the partial feature extractor 24 is analogous to that performed by the local feature extractor 23; partial features are extracted from a plurality of local feature extraction results obtained by the local feature extractor 23 as the feature extraction results of a lower layer. The partial features to be extracted are also desirably constituent elements of the features to be extracted by the face extractor 26 as a higher layer, i.e., constituent elements of the face in this embodiment.

In this embodiment as described above, the partial feature extractor 24 extracts, e.g., the eye and mouth. The process of extraction is the same as the extraction method of the local feature extractor 23; features need only be extracted by filtering using specific two-dimensional masks. Alternatively, it is also possible to simply extract the eye and mouth in accordance with whether, in the feature extraction results obtained by the local feature extractor 23, features having likelihoods of a predetermined value or more have a specific spatial positional relationship.
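
A sketch of the alternative, non-filtering check is given below; the likelihood threshold and the geometric limits are illustrative values introduced here, not values from the text, and the peak-based localization is an assumption about how "a specific spatial positional relationship" could be tested.

```python
import numpy as np

def eye_candidate_present(left_end, right_end, upper_seg, lower_seg,
                          threshold=0.5, max_width=40, max_height=15):
    """Decide whether an eye is present from four local feature maps by
    checking that features above a likelihood threshold lie in the
    expected spatial arrangement."""
    def peak(likelihood_map):
        idx = np.unravel_index(np.argmax(likelihood_map), likelihood_map.shape)
        return idx if likelihood_map[idx] >= threshold else None

    l, r = peak(left_end), peak(right_end)
    u, d = peak(upper_seg), peak(lower_seg)
    if None in (l, r, u, d):
        return False
    # Left end point to the left of the right end point, upper segment
    # above the lower one, and all four close together.
    return (l[1] < r[1] and u[0] < d[0]
            and abs(r[1] - l[1]) < max_width
            and abs(d[0] - u[0]) < max_height)
```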

Each of the eye and mouth extracted as described above is also held as information such as the type of the extracted feature, the position in the image, the likelihood of the feature to be extracted, and the feature amount. In this embodiment, the results of filtering performed for the local feature extraction results by using the two-dimensional masks unique to the eye and mouth are integrated in each position of the image, and held as the likelihood in the position of each partial feature.

The partial feature distribution determinator 25 performs simple distribution analysis on the feature extraction results obtained by the partial feature extractor 24. In addition, on the basis of the analytical result, the partial feature distribution determinator 25 gives an activation instruction to one or a plurality of predetermined face extraction modules of the face extractor 26.

Unlike in the processes performed from the initial feature extractor 22 to the partial feature extractor 24, the analysis herein mentioned extracts necessary conditions for each predetermined face extraction module to which the activation instruction is to be given. For example, in this embodiment, this analysis determines whether the eye is extracted near predetermined coordinates in the input image by the processing of the partial feature extractor 24. The analysis also determines whether the barycentric position of the mouth extraction results obtained by the processing of the partial feature extractor 24 is in the vicinity of the predetermined coordinates. Alternatively, the analysis determines whether the total of the likelihoods of the eye as the processing results of the partial feature extractor 24 is equal to or larger than a predetermined value.

These analyses as described above can be performed by presetting conditions corresponding to modules which make up the face extractor 26 and perform face extraction corresponding to a plurality of variances. The variances herein mentioned are changes in features obtained by, e.g., affine transformation such as rotational transformation and size transformation, and transformation corresponding to, e.g., a case in which the face is turned to the side. For example, one necessary condition set for a face extraction module corresponding to a clockwise planar rotational variance is that the barycenter of the mouth extraction results is present off to the lower left of the center of the image, and the barycenter of the eye extraction results is off to the upper right of the barycenter of the mouth extraction results.

Several analyses as described above are performed, and an activation instruction is issued to predetermined face extraction modules meeting the conditions of analysis. The barycenters and the total of likelihoods may also be analyzed within a predetermined range, e.g., a position where the eye is expected to exist. It is also possible to compare the totals of likelihoods of two or more features. Since modules for feature extraction are thus selected by the analyses having the simple necessary conditions as described above, the processing cost can be reduced, and identification errors can also be reduced.
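
The sketch below illustrates this selection step under the assumption of NumPy; the module names, the likelihood threshold, and every condition except the clockwise-rotation example taken from the text are illustrative.

```python
import numpy as np

def select_face_modules(eye_map, mouth_map, img_center, likelihood_min=1.0):
    """Return the names of face extraction modules whose simple necessary
    conditions hold; an activation instruction (ON flag) would be issued
    only to these modules."""
    def barycenter(m):
        total = m.sum()
        if total <= 0:
            return None
        ys, xs = np.indices(m.shape)
        return (float((ys * m).sum() / total), float((xs * m).sum() / total))

    eye_bc, mouth_bc = barycenter(eye_map), barycenter(mouth_map)
    active = []
    if eye_map.sum() >= likelihood_min and mouth_bc is not None:
        active.append("frontal")
        # Clockwise planar rotation: mouth barycenter off to the lower left
        # of the image center, eye barycenter to its upper right.
        if (mouth_bc[0] > img_center[0] and mouth_bc[1] < img_center[1]
                and eye_bc is not None
                and eye_bc[0] < mouth_bc[0] and eye_bc[1] > mouth_bc[1]):
            active.append("clockwise_rotation")
    return active
```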

In the face extractor 26, only predetermined face extraction modules having received the activation instruction from the partial feature distribution determinator 25 perform a feature extraction process similar to that of the partial feature extractor 24 by using the extraction results of the eye and mouth obtained by the partial feature extractor 24. Examples of prepared modules corresponding to specific variances are a module specialized to a variance in size (ii in FIG. 4), a module specialized to a variance caused by planar rotation (iii in FIG. 4), a module specialized to a variance caused by a horizontal shake of the face (iv in FIG. 4), and a module specialized to a variance caused by a vertical shake of the face.

In this embodiment, a specific two-dimensional mask is prepared for each module corresponding to the variance as described above, and only a module having received the activation instruction performs filtering by using the specific two-dimensional mask. The two-dimensional mask is set in the same manner as explained for the local feature extractor 23; the two-dimensional mask is set by giving, as a test pattern, a face having a specific variance corresponding to a module so that the module is specialized to the corresponding variance.

This face extraction is performed by using the face around the center of the image as a target. Therefore, unlike the feature extraction processes up to the partial feature extractor 24, filtering need not be performed in each position of the image but need only be performed within the face extraction range of the image.

The detection result output unit 27 performs final input image category classification from the results of filtering performed by those modules corresponding to the variances, which have received the activation instruction and performed the face extraction process. In this embodiment, the detection result output unit 27 simply determines whether the output value of each activated face extraction module has exceeded a threshold value set for the module. If the output value of at least one module has exceeded the threshold value, the detection result output unit 27 determines that the input image is a face image; if not, the detection result output unit 27 determines that the input image is a non-face image.

This determination is not limited to the above method. For example, final determination may also be performed by integrating the output values of the activated modules. More specifically, identification errors can be reduced by suppressing the outputs of modules having conflicting variances. For example, it is possible to subtract, from the output value of a module corresponding to a clockwise planar rotational variance, the output value of a module corresponding to a counterclockwise planar rotational variance, as an opposite variance category, after a predetermined weight is applied to the latter output value.

Also, the threshold values for identification can be increased by promoting the outputs of modules corresponding to similar variances. As a consequence, identification errors can be reduced. For example, it is possible to add, to the output value of a module corresponding to a face having a specific size, the output value of a module corresponding to a face having a size slightly larger than the specific size, which is a similar variance category, after a predetermined weight is applied to the latter output value.

It is also possible to perform weighted addition or a simple arithmetic mean operation for the output values of two or more modules corresponding to similar categories as described above, and newly set the obtained value as the output value of a virtual feature extraction module corresponding to an intermediate variance between the categories. Consequently, high-accuracy identification can be performed with a low processing cost while suppressing identification errors.
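
A minimal sketch of this integration of module outputs is given below; the weight of 0.5, the dictionary-based bookkeeping, and the module names are assumptions made for illustration only.

```python
def integrate_module_outputs(outputs, thresholds, opposite=None, similar=None, w=0.5):
    """Decide face / non-face from the activated modules' output values.

    outputs and thresholds map a module name to its output value and its
    threshold; 'opposite' and 'similar' optionally map a module to a
    conflicting or neighbouring variance module whose weighted output is
    subtracted from or added to it before thresholding."""
    opposite, similar = opposite or {}, similar or {}
    for name, value in outputs.items():
        if name in opposite and opposite[name] in outputs:
            value -= w * outputs[opposite[name]]   # suppress conflicting variance
        if name in similar and similar[name] in outputs:
            value += w * outputs[similar[name]]    # promote similar variance
        if value > thresholds[name]:
            return True   # face image
    return False          # non-face image
```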

The above first embodiment is explained as an example of a method of identifying whether input two-dimensional image data belongs to a certain specific category, wherein a face image in which the center of a face is present in substantially the center of the input image and a non-face image which is an image other than the face image are assumed as identification categories, and whether the input image data is one of these two categories is identified.

(Second Embodiment)

In the second embodiment, a method of detecting the position of a face in input two-dimensional image data will be described as a modification of the above first embodiment. In this embodiment, a process of detecting the face in an image will be explained below. However, as in the first embodiment, the application of the present invention is not limited to the process of detecting the face in an image. That is, the present invention is also applicable to a process of detecting another image pattern or a predetermined pattern from input voice data. In addition, the present invention can be applied to detection of objects of a plurality of categories.

In this embodiment, as a method of detecting, with robustness against variances, a specific pattern from two-dimensional image data by hierarchical feature extraction, the basic configuration of a convolutional neural network (to be referred to as CNN hereinafter) is changed. FIG. 7 shows the basic CNN arrangement. The basic processing of the CNN will be explained below with reference to FIG. 7. In FIG. 7, the processing flows to the right from the left end as an input end.

In FIG. 7, reference numeral 71 denotes a pixel value distribution corresponding to, e.g., the luminance value of an input image. Reference numerals 72, 74, 76, and 78 denote feature detecting layers. Reference numerals L7-21, L7-22, L7-23, L7-24, L7-41, L7-42, L7-43, L7-44, L7-61, L7-62, and L7-81 in these layers denote feature detecting cell planes. Reference numerals 73, 75, and 77 denote feature integrating layers. Reference numerals L7-31, L7-32, L7-33, L7-34, L7-51, L7-52, L7-53, L7-54, L7-71, and L7-72 in these layers denote feature integrating cell planes.

In the CNN, two layers, i.e., a feature detecting layer and a feature integrating layer, are combined as one set, and these layers are hierarchically arranged. Each feature detecting cell plane in the feature detecting layer has a feature detecting neuron which detects a certain specific feature. Each feature detecting neuron is connected to the feature detection result of a layer in the preceding stage by a weight distribution unique to each feature detecting cell plane, within a local range corresponding to the position of the feature detecting neuron. For example, a feature detecting neuron in the feature detecting layer 74 is connected to the feature detection results from L7-31 to L7-34, and a feature detecting neuron in the feature detecting layer 72 is connected to the input image 71, by a weight distribution unique to each feature detecting cell plane (e.g., L7-21).

This weight corresponds to a differential filter for extracting an edge or a two-dimensional mask for extracting a specific feature described in the first embodiment. As described in the first embodiment, this weight can be set by using knowledge obtained in advance, or by learning which gives a plurality of test patterns. It is also possible to set the weight by using a known neural network learning method, e.g., learning using the back propagation method, or self-organizing learning using the Hebb learning rule.

Each feature detecting neuron performs weighted addition, with predetermined weights, on the feature detection results of the feature cell plane to which it is connected. If the neuron is in the feature detecting layer 72, the weighted addition is performed on the luminance values or the like of the input image. In addition, the value of the operation result is transformed by a nonlinear function such as a hyperbolic tangent function, and the obtained value is used as the output value of the feature detecting neuron, thereby detecting a feature.

For example, if L7-21 is a cell plane for detecting a vertical edge, each feature detecting neuron in L7-21 performs weighted addition corresponding to a differential filter with respect to the luminance values of the input image. In this manner, in a position of the input image where a vertical edge is present, the value of the operation result of the feature detecting neurons in L7-21 increases, and this increases the output value. That is, a feature is detected.
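
The computation of a single feature detecting neuron can be sketched as follows, assuming NumPy and a position far enough from the image border that the local window fits; the data layout (one 2-D array per preceding cell plane, one shared weight array per connection) is an illustrative assumption.

```python
import numpy as np

def feature_detecting_neuron(prev_maps, weights, y, x):
    """Output of one feature detecting neuron at position (y, x): a weighted
    sum over a local window of each connected preceding cell plane, passed
    through a hyperbolic tangent nonlinearity."""
    s = 0.0
    for name, w in weights.items():          # one weight array per connected plane
        h, v = w.shape[0] // 2, w.shape[1] // 2
        window = prev_maps[name][y - h:y + h + 1, x - v:x + v + 1]
        s += float(np.sum(window * w))
    return np.tanh(s)                         # nonlinear transformation of the sum
```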

This similarly applies to other feature detecting cell planes; in a position of each feature detecting cell plane where a specific feature is detected, a feature detecting neuron outputs a large value. Although the output value is generally calculated by nonlinear transformation as described above, the calculation method is not limited to this transformation.

Each feature integrating cell plane (e.g., L7-31) in a feature integrating layer (e.g., 73) has a feature integrating neuron which is connected to one feature detecting cell plane (e.g., L7-21) of a feature detecting layer (e.g., 72) as a layer in the preceding stage, and connected within a local range to the feature detection results in the preceding stage so as to diffuse (integrate) the feature detection results. Each feature integrating neuron basically performs the same arithmetic operation as the feature detecting neuron described above. The characteristic of this feature integrating neuron is that the weight distribution corresponding to a specific two-dimensional mask is a Gaussian filter or a low-pass filter.
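
Because the integrating weights form a Gaussian or low-pass mask, a whole feature integrating cell plane can be sketched as a Gaussian smoothing of the corresponding detecting plane; the SciPy call and the sigma value below are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def feature_integrating_plane(detecting_plane: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """One feature integrating cell plane: pool the corresponding feature
    detecting cell plane over a local range with a Gaussian weight
    distribution, diffusing (integrating) the detection results."""
    return gaussian_filter(detecting_plane, sigma=sigma)
```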

The network structure of the CNN gradually detects high-order features from initial features by using the hierarchical feature detecting and integrating processes as described above, and finally categorizes the input. Specific image detection can be performed by detecting high-order features from an input image by the above processing. The CNN is characterized in that identification which is robust against variances having various patterns can be performed by hierarchical feature extraction and by diffusion by the feature integrating layers.

This embodiment will be described below by taking the CNN described above as the basic hierarchical feature extraction process configuration. FIG. 8 shows the arrangement of processors according to this embodiment. FIGS. 9A and 9B show the flow of processing according to this embodiment. The processing of this embodiment will be explained below with reference to FIGS. 8, 9A, and 9B.

Referring to FIG. 8, an image input unit 801, initial feature extractor 802, local feature extractor 803, and partial feature extractor 804 are similar to the image input unit 21, initial feature extractor 22, local feature extractor 23, and partial feature extractor 24, respectively, of the first embodiment. Also, processes in steps S901 to S904 are the same as in steps S301 to S304 of FIG. 3.

In this embodiment, an RGB color image is used in the image input unit 801, and a grayscale image obtained by converting this RGB color image is input to the initial feature extractor 802 in the next layer. In addition, processing performed by the CNN described above is used in feature extraction, and each feature extractor integrates a feature detected in a feature detecting layer and a feature detected in a feature integrating layer. The types of features extracted by the local feature extractor 803 and partial feature extractor 804 are analogous to those of the first embodiment. Also, similar to the method of setting a unique two-dimensional mask explained in the first embodiment, a weight distribution unique to each feature detecting cell plane for detecting a feature is set by learning by inputting a plurality of test patterns.

In this embodiment, features to be extracted by the initial feature extractor 802 are not limited beforehand. Instead, the back propagation method is used when features detected by the local feature extractor 803 are learned, thereby learning a weight distribution unique to each feature detecting cell plane for detecting a local feature, and automatically setting a weight distribution unique to each feature cell plane for detecting an initial feature. In this manner, a weight distribution coupled with the input image 71 can be automatically set so that the initial feature extractor 802 extracts initial features which make up a local feature detected by the local feature extractor 803, and are necessary to detect the local feature.

In step S905, a first face extractor 805 performs the same processing as the above-mentioned feature extraction method for the eye and mouth extraction results obtained by the partial feature extractor 804, thereby extracting the face in the image.

If the output value from the first face extractor 805 exceeds a predetermined threshold value, a face candidate existence determinator 806 determines that a candidate for the face exists (step S906). Then, the face candidate existence determinator 806 sets the number of face candidates in Count (step S907), sequentially outputs the coordinates of the face candidate existence positions found to have the face candidates, and issues an activation instruction to a skin color region extractor 807 and a partial feature distribution determinator 808 (step S908).

When receiving the activation instruction from the face candidate existence determinator 806, the skin color region extractor 807 extracts a skin color region from the input image within a range based on the face candidate existence position coordinates (step S909). The partial feature distribution determinator 808 determines the distribution of the partial feature extraction results within the range based on the face candidate existence position coordinates (step S910). In addition, as in the first embodiment, the partial feature distribution determinator 808 turns on the flags of face extraction modules to be activated (step S911).

The partial feature distribution determinator 808 of this embodiment differs from the partial feature distribution determinator 25 of the first embodiment in that the partial feature distribution determinator 808 uses not only the feature extraction results from the partial feature extractor 804 but also the skin color region extraction results from the skin color region extractor 807. The partial feature distribution determinator 808 performs simple distribution analysis on these feature extraction results, and issues an activation instruction to a second face extractor 809, which includes face extraction modules corresponding to a plurality of variances. Note that one face extraction module in this embodiment corresponds to one feature detecting cell plane in the CNN.

As in the first embodiment, the second face extractor 809 causes face extraction modules corresponding to variances to perform face extraction. That is, the second face extractor 809 sequentially causes face extraction modules having ON flags to perform face extraction at the face candidate existence position coordinates, and turns off the flags of the face extraction modules having executed face extraction (steps S911 to S914).

Unlike in the first embodiment, the face extraction process in this embodiment extracts a face corresponding to specific variances by using not only the eye and mouth feature extraction results obtained by the partial feature extractor 804, but also the feature extraction results corresponding to the upper portion of the eye or the upper portion of the lips obtained by the local feature extractor 803, and the skin color region extraction results obtained by the skin color region extractor 807.

On the basis of the face extraction results from the second face extractor 809, a detection result output unit 810 outputs a result indicating the position of the face in the input image. That is, the detection result output unit 810 integrates the output results from the individual modules (step S914), and outputs a detection result in the face candidate existence position (step S915). The flow then loops to detection in the next face candidate existence position (steps S917 and S918).

Details of the processes performed by the processors on and after the first face extractor 805 in this embodiment will be explained below.

The face extraction process performed by the first face extractor 805 is the same as the feature extraction processes performed by the local feature extractor 803 and partial feature extractor 804. This face extraction is made up of only one module, although the face extractor 26 of the first embodiment has a plurality of face extraction modules corresponding to variances. Also, unlike in the first embodiment, the position of a face in an image is detected in this embodiment. Therefore, face extraction is performed not only near the center of the image but also in different positions of the image.

A unique weight distribution of each face detecting neuron, which is used in extraction and connected to the partial feature extraction result obtained by the partial feature extractor 804, is set on the basis of learning by which faces having various variances (e.g., faces having various variances as indicated by i to iv in FIG. 4) are given as test data. This learning increases the possibility that a non-face portion is regarded as a face, i.e., decreases the accuracy. However, faces having various variances can be extracted by a single module. This processor detects features by using the learned weight distribution as described above, and the feature integrating layer integrates the results.

For the results of the face extraction process performed by the first face extractor 805, the face candidate existence determinator 806 determines a portion where the output is equal to or larger than a predetermined threshold value. The face candidate existence determinator 806 determines that a face candidate exists in the determined position, and issues an activation instruction to the skin color region extractor 807 and partial feature distribution determinator 808 to perform processing within the range in which this candidate exists.

Upon receiving the activation instruction from the face candidate existence determinator 806, the skin color region extractor 807 extracts a skin color region near the range within which the face candidate exists. In this embodiment, within the region in which a skin color region is to be extracted, the RGB color input image is converted into the HSV colorimetric system, and only pixels within the range of a specific hue (H) are extracted as a skin color region. A method of extracting a skin color region is not limited to this method, and another generally known method may also be used. For example, it is also possible to extract a skin color region by using saturation (S) or luminance (V). In addition, although a skin color region is extracted in this embodiment, a hair region or the like may also be extracted.
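
The hue-based extraction can be sketched as follows; the particular hue interval is an illustrative assumption, not a value taken from the embodiment, and the per-pixel loop is written for clarity rather than speed.

```python
import colorsys
import numpy as np

def extract_skin_color_region(rgb_image: np.ndarray, hue_range=(0.0, 0.11)) -> np.ndarray:
    """Convert the RGB input to the HSV colorimetric system and keep only
    pixels whose hue (H) falls inside a specific range, returning a
    boolean mask of the skin color region."""
    rgb = rgb_image.astype(float) / 255.0
    mask = np.zeros(rgb.shape[:2], dtype=bool)
    for yy in range(rgb.shape[0]):
        for xx in range(rgb.shape[1]):
            h, _, _ = colorsys.rgb_to_hsv(*rgb[yy, xx])
            mask[yy, xx] = hue_range[0] <= h <= hue_range[1]
    return mask
```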

The partial feature distribution determinator 808 performs the same processing as the partial feature distribution determinator 25 of the first embodiment. In this embodiment, the partial feature distribution determinator 808 receives the activation instruction from the face candidate existence determinator 806, similar to the skin color region extractor 807, and analyzes the distribution of predetermined feature extraction results near the range within which the face candidate exists. In accordance with the result of the analysis, the partial feature distribution determinator 808 gives an activation instruction to the second face extractor 809, which is made up of face extraction modules corresponding to a plurality of specific variances, so as to select predetermined face extraction modules and perform face extraction in the face candidate existence position.

The feature extraction results analyzed by the partial feature distribution determinator 808 are the eye and mouth extraction results obtained by the partial feature extractor 804, and the skin color region extraction result obtained by the skin color region extractor 807. This analysis is the same as in the first embodiment; each module forming the second face extractor 809 and corresponding to a variance extracts a necessary condition to be met if a face exists.

Since this embodiment uses the skin color region extraction result unlike in the first embodiment, several examples of the analysis for this result will be explained below. The simplest example is the analysis of the area of an extracted skin color region. It is also possible to analyze the aspect ratio of an extracted skin color region, or analyze the relative positional relationship between the barycenters of skin color regions in the upper half and lower half of a region found to have a face candidate.

The first example serves as one necessary condition of a face extraction module corresponding to a specific size in accordance with the area. The second example is one necessary condition of a module corresponding to a horizontal shake or vertical shake of the face. The third example can be set as one necessary condition of a module corresponding to planar rotation of the face. It is also possible, by using the partial feature extraction results obtained by the partial feature extractor 804, to compare the area of a region from which the eye is extracted with the area of a skin color region, compare the area of a region from which the eye is not extracted with the area of the skin color region, or compare the area of the region from which the eye is not extracted with the area of a non-skin-color region.
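
A sketch of these three skin-region analyses (area, aspect ratio, and relative barycenter positions of the upper and lower halves) is given below; how each returned quantity is turned into a module-specific necessary condition is left to the caller, and the split at the vertical midpoint is an assumption.

```python
import numpy as np

def analyze_skin_region(skin_mask: np.ndarray):
    """Simple analyses of the extracted skin color region within a face
    candidate range, usable as necessary conditions for face extraction
    modules corresponding to size, shake, and planar rotation."""
    ys, xs = np.nonzero(skin_mask)
    if ys.size == 0:
        return None
    area = int(ys.size)
    aspect = (xs.max() - xs.min() + 1) / (ys.max() - ys.min() + 1)
    mid = skin_mask.shape[0] // 2
    upper, lower = np.nonzero(skin_mask[:mid]), np.nonzero(skin_mask[mid:])
    shift = None
    if upper[0].size and lower[0].size:
        # Horizontal offset between the barycenters of the upper and lower halves.
        shift = float(upper[1].mean() - lower[1].mean())
    return {"area": area, "aspect_ratio": aspect, "upper_lower_shift": shift}
```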

Even the analysis of the area or the like as described above may also be performed only in a specific region as described in the first embodiment. For example, the area of a non-skin-color region can be analyzed in a region which is presumably a hair position. A more accurate activation instruction can be issued by adding this analysis to the analysis of the eye and mouth extraction results as in the first embodiment.

The second face extractor 809 is a processor similar to the face extractor 26 of the first embodiment, and includes a plurality of face extraction modules corresponding to specific variances. In this embodiment, unlike in the first embodiment, face extraction is performed in the face candidate existence position by using not only the eye and mouth extraction results obtained by the partial feature extractor 804, but also the skin color extraction result obtained by the skin color region extractor 807, the extraction results of faces having various variances obtained by the first face extractor 805, and the feature extraction result, among other features extracted by the local feature extractor 803, which corresponds to the upper portion of the eye or the upper portion of the lips.

The accuracy of feature extraction can be increased by thus subsidiarily using, e.g., the feature extraction result (in this embodiment, the first face extraction result) in the same layer, which is a feature on the same level, the feature extraction result (in this embodiment, the skin color region extraction result) externally inserted into the framework of hierarchical feature extraction, the feature extraction result (in this embodiment, the feature extraction result corresponding to the upper portion of the eye or the upper portion of the lips) in a layer before the immediately preceding layer, and the feature extraction result in a layer in the subsequent stage (to be explained in the third embodiment described later). Although this processing increases the processing cost, the increase in processing cost can be minimized because only a module having received the activation instruction from the partial feature distribution determinator 808 performs the feature extraction process of the second face extractor 809, and only in a position where a face candidate exists.

The detection result output unit 810 is a processor similar to the detection result output unit 27 of the first embodiment. That is, from the results of feature extraction performed by those modules which are activated by the activation instruction from the partial feature distribution determinator 808, among the face extraction modules forming the second face extractor 809 and corresponding to a plurality of variances, the detection result output unit 810 determines a position where the face exists in the image, and outputs the determination result. As explained in the first embodiment, the detection accuracy can be increased by integrating the outputs from a plurality of modules.

In the second embodiment as described above, detection of the position where a face exists is explained as an example of a method of detecting a certain specific object in input two-dimensional image data.

(Third Embodiment)

The third embodiment of the present invention is a modification of the second embodiment. As in the second embodiment, this embodiment performs a process of detecting the position of a face in an image. However, this embodiment is also applicable to another image pattern or voice data. In addition, the embodiment can be applied to detection of objects of a plurality of categories.

FIG. 10 shows the arrangement of processors of this embodiment. FIGS. 11A and 11B show the flow of processing of this embodiment. The basic process configuration of this embodiment is the same as explained in the second embodiment. The processing of this embodiment will be explained below with reference to FIG. 10.

Processes (steps S1101 to S1109) performed by components from an image input unit 1001 to a skin color region extractor 1007 shown in FIG. 10 are exactly the same as steps S901 to S909 in the second embodiment, so an explanation thereof will be omitted.

A partial feature distribution determinator 1008 also performs the same processing as the partial feature distribution determinator 808 in the second embodiment. However, the partial feature distribution determinator 1008 gives an activation instruction to face extraction modules corresponding to a plurality of variances in a second face extractor 1009 so as to perform a face extraction process in a face candidate existence position, in accordance with the analytical result of the distribution of feature extraction results, and also gives an activation instruction to a second partial feature extractor 1011 made up of partial feature extraction modules corresponding to a plurality of variances. That is, the partial feature distribution determinator 1008 determines the distribution of partial feature extraction results within a range based on the face candidate existence position coordinates (step S1110), and turns on the flags of face extraction modules to be activated (step S1111).

The second partial feature extractor 1011 includes a plurality of modules for extracting partial features corresponding to specific variances. Upon receiving the activation instruction from the partial feature distribution determinator 1008, a module in the second partial feature extractor 1011 re-extracts a partial feature only in a specific position determined by the face candidate existence position. That is, a partial feature extraction module corresponding to a face extraction module having an ON flag performs a partial feature extraction process in a position determined by the face candidate existence position coordinates (steps S1113 and S1114).

The second face extractor 1009 is a processor substantially the same as the second face extractor 809 of the second embodiment. However, if the second partial feature extractor 1011 re-extracts partial features corresponding to the activated face extraction modules, the second face extractor 1009 performs face extraction by using the features re-extracted by the second partial feature extractor 1011 instead of the features extracted by the partial feature extractor 1004. That is, the second face extractor 1009 performs face extraction in the face candidate existence position by using a face extraction module having an ON flag, and turns off the flag of the face extraction module having executed face extraction (steps S1115 and S1116).

A detection result output unit 1010 is exactly the same as the detection result output unit 810 of the second embodiment, and steps S1117 to S1120 are also exactly the same as steps S915 to S918 of the second embodiment, so an explanation thereof will be omitted.

Details of the processes in the partial feature distribution determinator 1008, second partial feature extractor 1011, and second face extractor 1009 of this embodiment will be described below.

As described above, the partial feature distribution determinator 1008 is the same as in the second embodiment in the process of analyzing the distribution of partial feature extraction results. In the second embodiment, an activation instruction is issued to modules which perform face extraction corresponding to a plurality of variances. However, the partial feature distribution determinator 1008 also issues an activation instruction to the second partial feature extractor 1011, which extracts partial features corresponding to the variances of the face extraction modules to which the activation instruction is issued. More specifically, when issuing an activation instruction to a face extraction module corresponding to, e.g., a clockwise planar rotational variance, the partial feature distribution determinator 1008 simultaneously issues an activation instruction to a partial feature extraction module corresponding to the same clockwise planar rotational variance.

The second partial feature extractor 1011 includes a plurality of modules which extract partial features corresponding to a plurality of variances. In the second partial feature extractor 1011, partial feature extraction modules corresponding to modules which have received an activation instruction from the partial feature distribution determinator 1008 and perform face extraction corresponding to a plurality of variances are activated to extract partial features only within a specific range determined by the face candidate existence position obtained as the result of the face candidate existence determinator 1006. The method of feature extraction is the same as explained in the second embodiment.

Each partial feature extraction module basically corresponds to one of the face extraction modules forming the second face extractor 1009 and corresponding to a plurality of variances. However, this correspondence need not be a one-to-one correspondence. For example, a partial feature extraction module corresponding to a face extraction module for a full face may also be omitted. In this case, if an activation instruction is issued to this face extraction module for a full face, the second partial feature extractor 1011 need not perform any processing.

Furthermore, one partial feature extraction module may also correspond to a plurality of types of face extraction modules. For example, a face extraction module corresponding to a 15° clockwise planar rotational variance and a face extraction module corresponding to a 30° clockwise planar rotational variance can be related to a single partial feature extraction module which performs extraction covering these two variances.
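
The correspondence just described can be pictured as a simple lookup table; the module names and the grouping below are purely illustrative, not taken from the specification.

    # Hypothetical correspondence table between face extraction modules and
    # partial feature extraction modules. It is not one-to-one: the full-face
    # module has no partial module (None), and the 15-degree and 30-degree
    # clockwise rotation modules share one partial module covering both.
    PARTIAL_MODULE_FOR_FACE_MODULE = {
        "full_face":          None,
        "cw_rotation_15deg":  "cw_rotation_0_to_30deg",
        "cw_rotation_30deg":  "cw_rotation_0_to_30deg",
    }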

As described above, a feedback mechanism which controls the operation of feature extraction modules in a lower layer on the basis of the feature extraction result output from a higher layer is introduced. That is, the accuracy of feature extraction can be further increased by re-extracting low-order features with partial feature extraction modules that correspond to the face extraction modules activated in the second face extraction and to specific variances. Although this re-extraction of features increases the processing cost, the increase can be minimized because a module having received an activation instruction performs processing only in a specific position.

In this embodiment, the second partial feature extractor 1011 performs only eye extraction corresponding to variances, without any mouth extraction. To further increase the feature extraction accuracy, mouth extraction corresponding to variances may also be performed, or features other than those extracted by the partial feature extractor 1004 may also be extracted.

Furthermore, in this feature extraction, eye extraction is performed by using the partial feature extraction results of, e.g., the eye and mouth obtained by the partial feature extractor 1004, and the first face extraction results obtained by the first face extractor 1005, in addition to the local feature extraction results obtained by the local feature extractor 1003. As already described in the second embodiment, the accuracy of the feature extraction process can be increased by subsidiarily using the feature extraction result in the same layer, which is a feature on the same level, and the feature extraction result in a higher layer, which is a feature on a higher level.
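
As a rough illustration of this subsidiary use of same-layer and higher-layer results, the sketch below combines likelihood maps with placeholder weights; the weights and the assumption that all maps share the same shape are not from the specification.

    import numpy as np

    # Hypothetical sketch of eye re-extraction: the local feature result from
    # the local feature extractor 1003 is the main input, while the same-layer
    # eye/mouth results from the partial feature extractor 1004 and the
    # higher-layer result from the first face extractor 1005 are added with
    # small subsidiary weights. All arguments are likelihood maps of the
    # same shape.
    def reextract_eye(local_map, eye_map, mouth_map, first_face_map,
                      w_same=0.3, w_higher=0.5):
        return (local_map
                + w_same * (eye_map + mouth_map)    # same-layer, subsidiary
                + w_higher * first_face_map)        # higher-layer, subsidiary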

The second face extractor 1009 basically performs the same processing as the second face extractor 809 of the second embodiment. The difference from the second face extractor 809 of the second embodiment is as follows: if the second partial feature extractor 1011 has performed partial feature extraction corresponding to variances in accordance with the activated face extraction modules, face extraction is performed by using the results of that variance-specific partial feature extraction, rather than the partial feature extraction results obtained by the partial feature extractor 1004.

In this embodiment, the second partial feature extractor 1011 performs only eye extraction, so mouth extraction is performed using the extraction results from the partial feature extractor 1004. As explained above in relation to the second partial feature extractor 1011, if, for example, there is no partial feature extraction module corresponding to a face extraction module for a full face, the second partial feature extractor 1011 does not re-extract any features when an activation instruction is issued to this face extraction module for a full face.

In a case like this, the feature extraction results from the partial feature extractor 1004 can be used directly. In this embodiment, when partial feature extraction corresponding to variances is performed in relation to activated face extraction modules, the eye extraction result obtained by the partial feature extractor 1004 is not used. To further increase the accuracy, however, this feature extraction result may also be subsidiarily used.

As described above, the third embodiment is a modification of the second embodiment, and explains an example in which the position of a face, as a certain specific object, is detected in input two-dimensional image data.

(Fourth Embodiment)

In the fourth embodiment of the present invention, the connecting form in a hierarchical structure is changed.

FIG. 13 shows the hierarchical structure of a pattern identification apparatus according to the fourth embodiment. The outline of the pattern identification method will be described with reference to FIG. 13. A data input unit 131 inputs data for identifying patterns. The input data is basically processed from the left side to the right side in FIG. 13. Features are gradually extracted from low-order features to high-order features, and an ultimate high-order feature is extracted.

A feature extraction layer 132 has at least one feature extraction plane 133. The feature extraction plane 133 includes a large number of feature extractors and extracts a predetermined feature by using the extraction result of another coupled feature extraction plane. The feature extractors within one feature extraction plane have identical structures and extract the same type of feature. Each feature extractor basically extracts a local feature. The predetermined features are topologically extracted from the input data by the large number of feature extractors within one feature extraction plane.
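
The following sketch shows one way such a plane could be realized: the same local mask is applied by every feature extractor at its own position, so the feature is extracted topologically over the whole input. The mask and the correlation-style operation are assumptions, not the specification's exact filter.

    import numpy as np

    # Hypothetical sketch of one feature extraction plane: identical local
    # extractors, each applying the same mask at its own position.
    def extract_plane(prev_plane: np.ndarray, mask: np.ndarray) -> np.ndarray:
        h, w = prev_plane.shape
        mh, mw = mask.shape
        out = np.zeros((h - mh + 1, w - mw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                # one feature extractor = the shared mask applied at (y, x)
                out[y, x] = np.sum(prev_plane[y:y + mh, x:x + mw] * mask)
        return out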

The features extracted in a normal feature extraction plane are used for feature extraction in the feature extraction layer immediately succeeding that plane. However, as shown in FIG. 13, features extracted by a reuse feature extraction plane 133 a are used in feature extraction not only for the layer immediately succeeding the plane 133 a but also for a higher-order feature extraction layer.

A non-hierarchical feature plane 133 b inputs a feature other than the features hierarchically extracted from the input data. For example, the non-hierarchical feature plane 133 b inputs, as a feature, information or the like from a sensor other than the input data sensor.

An intra-layer reuse feature extraction plane 133 c extracts a feature used in another feature extraction plane 133 d within the same layer. In this embodiment, feature extraction is performed using features extracted previously within the same layer. However, after feature extraction is performed in a higher-order layer, feature extraction may also be performed in a lower-order layer using the extraction result of the higher-order layer.
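
The connection forms described for FIG. 13 can be summarized by the following wiring sketch. The plane names, the callable interface, and the particular layer assignments are hypothetical; only the data flow (reuse across layers, a non-hierarchical input, and intra-layer reuse) follows the description above.

    # Hypothetical wiring of the connection forms in FIG. 13.
    def forward(input_data, other_sensor_data, planes):
        f_133a = planes["reuse_133a"](input_data)                 # lower layer
        f_133b = planes["non_hierarchical_133b"](other_sensor_data)
        f_133c = planes["intra_reuse_133c"](f_133a)               # next layer
        f_133d = planes["plane_133d"](f_133a, f_133c)             # same layer, reuses 133c
        # a higher-order layer reuses 133a's output in addition to the
        # outputs of the immediately preceding layer and the 133b input
        return planes["high_order"](f_133d, f_133b, f_133a)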

With the above processes, features are gradually extracted from the input data in the order of low-order features to high-order features, and desired feature extraction is finally performed to identify the input data pattern.

FIGS. 14A and 14B are views showing the outline of a result integrating process according to this embodiment. A feature extraction plane 133 is identical to that shown in FIG. 13. A feature extractor 14 is the one described with reference to FIG. 13. The feature extractors 14 generate outputs (likelihoods of features corresponding to positions) Output(x) as the feature extraction result.

The outline of the result integrating process will be described with reference to FIG. 14A. Each feature extractor 14 a is an excitation or repression feature extractor. Each feature extractor 14 b gives excitation, while each feature extractor 14 c gives repression. These feature extractors 14 extract different features at the same position of the input data.

A feature extracted by the excitation or repression feature extractor 14 a has a high similarity to a feature extracted by the excitation feature extractor 14 b, but has a low similarity to a feature extracted by the repression feature extractor 14 c. A value obtained by multiplying an output Output(r) from the excitation feature extractor 14 b by a predetermined weight α is added to an output Output(q) from the excitation or repression feature extractor 14 a. A value obtained by multiplying an output Output(p) from the repression feature extractor 14 c by a predetermined weight β is subtracted from the output Output(q). These integrating processes make it possible to reduce identification errors at low processing cost.
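
Written out as code, the integration described above amounts to adding the weighted excitation output and subtracting the weighted repression output; the weight values below are placeholders.

    # Excitation/repression integration for the extractor 14a at one position.
    # alpha and beta are the predetermined weights (placeholder values here).
    def integrate_excitation_repression(out_q, out_r, out_p, alpha=0.5, beta=0.5):
        return out_q + alpha * out_r - beta * out_p   # add excitation, subtract repression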

The outline of the result integrating process will be described with reference to FIG. 14B. A virtual feature extraction plane 15 includes a large number of virtual feature extractors 16. Feature extractors 14 e and 14 f in FIG. 14B are feature extractors used for integration. The virtual feature extractor 16 is an integrated virtual feature extractor. Features extracted by the feature extractors 14 e and 14 f used for integration are of the same type but have different variance levels (e.g., sizes).

An output Output(q) from the integrated virtual feature extractor 16 is the average value of outputs Output(r) and Output(p) from the feature extractors 14 e and 14 f used for integration, or a sum of the outputs Output(r) and Output(p) weighted by predetermined weighting coefficients. This result integrating process makes it possible to achieve strong identification against the variance of the input pattern at low processing cost.
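
A corresponding sketch of the virtual feature extractor's integration, again with placeholder names and weights:

    # Integration into the virtual feature extractor 16: either the plain
    # average of the outputs of 14e and 14f, or a weighted sum when
    # predetermined weighting coefficients are given.
    def integrate_virtual(out_r, out_p, w_r=None, w_p=None):
        if w_r is None or w_p is None:
            return (out_r + out_p) / 2.0
        return w_r * out_r + w_p * out_p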

Note that the above embodiments can be combined and practiced as appropriate.

According to each embodiment described above, it is possible to perform pattern recognition capable of robust identification against the variances of an input pattern, while reducing the processing cost and decreasing the possibility of identification errors.


<Other Embodiments by, e.g., Software>

The present invention can be applied as part of a system constituted by a plurality of devices (e.g., a host computer, interface device, reader, and printer) or as part of a single apparatus (e.g., a copying machine or facsimile apparatus).

Also, the present invention is not limited to the apparatuses and methods which implement the above embodiments, or to a method performed by combining the methods explained in the embodiments. That is, the scope of the present invention also includes a case in which the program code of software for implementing the above embodiments is supplied to a computer (or a CPU or MPU) of the system or apparatus described above, and this computer of the system or apparatus implements the embodiments by operating the various devices described above in accordance with the program code.

In this case, the program code itself of the software implements the functions of the above embodiments, and the program code itself and a means for supplying this program code to the computer, more specifically, a storage medium storing the program code, come within the scope of the present invention.

As this storage medium storing the program code, it is possible to use, e.g., a floppy (R) disk, hard disk, optical disk, magnetooptical disk, CD-ROM, magnetic tape, nonvolatile memory card, or ROM.

The program code also falls under the scope of the present invention not only in a case in which the computer implements the functions of the above embodiments by controlling the various devices in accordance with the supplied program code, but also in a case in which the program code implements the above embodiments in collaboration with, e.g., an OS (Operating System) or other application software running on the computer.

Furthermore, the scope of the present invention also includes a case in which the supplied program code is stored in a memory of a function expansion board of the computer or in a memory of a function expansion unit connected to the computer, and a CPU or the like of the function expansion board or function expansion unit implements the above embodiments by performing part or the whole of the actual processing in accordance with instructions of the program code.

FIG. 12 is a view showing an example of the block configuration of an information processing apparatus which implements the present invention. As shown in FIG. 12, in this information processing apparatus, a CPU 1201, ROM 1202, RAM 1203, HD (Hard Disk) 1204, CD 1205, KB (KeyBoard) 1206, CRT 1207, camera 1208, and network interface (I/F) 1209 are connected via a bus 1210 so that they can communicate with each other.

The CPU 1201 controls the operation of the whole information processing apparatus by reading out process programs (software programs) from the HD (Hard Disk) 1204 or the like, and executing the readout programs.

The ROM 1202 stores programs and various data used in the programs.

The RAM 1203 is used as, e.g., a working area for temporarily storing process programs and information to be processed, in order to allow the CPU 1201 to perform various processes.

The HD 1204 is a component serving as an example of a large-capacity storage, and saves, e.g., various data such as model data, and process programs to be transferred to the RAM 1203 and the like when various processes are executed.

The CD (CD drive) 1205 reads out data stored in a CD (CD-R) as an example of an external storage, and writes data to this CD.

The keyboard 1206 is an operation unit by which a user inputs, e.g., various instructions to the information processing apparatus.

The CRT 1207 displays various pieces of directive information to a user, and various pieces of information such as character information and image information.

The camera 1208 senses an image to be identified, and inputs the sensed image.

The interface 1209 is used to load information from the network, and to transmit information to the network.

As many apparently widely different embodiments of the present invention can be made without departing from the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

CLAIM OF PRIORITY

This application claims priority from Japanese Patent Application No. 2003-417973 filed on Dec. 16, 2003, which is hereby incorporated by reference herein.

CLAIMS

1. A pattern identification method of identifying a pattern of input data by hierarchically extracting features of the input data, comprising: using a processor to perform the steps of: a first feature extraction step of extracting a feature of a first layer from the input data; an analysis step of analyzing a distribution of a feature extraction result in the first feature extraction step; a calculation step of calculating a respective likelihood of extracting from the input data a feature of one of a plurality of categories for features of a second layer, each feature of the second layer corresponding to a combination of features of the first layer, on the basis of the distribution analyzed in the analysis step, wherein the likelihood is calculated by filtering using a mask unique to the feature to be extracted, and integrating results of the filtering; a selection step of selecting at least one extraction module, among a plurality of extraction modules which extract features of respective categories, whose calculated likelihood of the category for the feature of the second layer to be extracted from the input data is not less than a predetermined value; a second feature extraction step of causing the selected extraction module to extract a feature of the second layer from the input data; and a storing step of storing the extracted feature of the second layer in a memory.
2. The method according to claim 1, wherein in the first or second feature extraction step, a feature obtained by performing a predetermined transformation to a predetermined feature is extracted.
3. The method according to claim 1, further comprising a re-extraction step of re-extracting a feature of a lower layer on the basis of a feature extraction result of a higher layer in the second feature extraction step.
4. The method according to claim 1, wherein in the analysis step, a distribution of each of a plurality of feature extraction results is analyzed, and a relative relationship between analytical results is analyzed.
5. The method according to claim 1, wherein in the analysis step, a distribution within a specific range of at least one of a plurality of feature extraction results is analyzed.
6. The method according to claim 1, wherein in the analysis step, whether the feature is extracted or not extracted within a predetermined range in a distribution of at least one of a plurality of feature extraction results is analyzed.
7. The method according to claim 1, wherein in the analysis step, a barycenter of a distribution of at least one of a plurality of feature extraction results is analyzed.
8. The method according to claim 1, wherein in the analysis step, a size of a range within which the feature is extracted or not extracted in a distribution of at least one of a plurality of feature extraction results is analyzed.
9. The method according to claim 1, wherein in the analysis step, a likelihood of at least one of a plurality of feature extraction results or a total of feature detection levels is analyzed.
10. The method according to claim 1, wherein the pattern identification is performed on a presence/absence of a face image contained in the input data.
11. The method according to claim 1, wherein the pattern identification is performed on a position of a face image contained in the input data.
12. The method according to claim 1, wherein, in the second feature extraction step, the feature of the second layer is extracted on the basis of a feature extraction result in the first layer and a feature extraction result in a layer other than the first layer.
13. The method according to claim 12, wherein the layer other than the first layer is a layer lower than the first layer.
14. The method according to claim 12, wherein the layer other than the first layer is the second layer.
15. The method according to claim 12, further comprising an integrating step of integrating feature extraction results by a plurality of feature extractors in the same layer.
16. A pattern identification apparatus for identifying a pattern of input data by hierarchically extracting features of the input data, comprising: first feature extracting means for extracting a feature of a first layer from the input data; analyzing means for analyzing a distribution of a feature extraction result obtained by said first feature extracting means; calculating means for calculating a respective likelihood of extracting from the input data a feature of one of a plurality of categories for features of a second layer, each feature of the second layer corresponding to a combination of features of the first layer, on the basis of the distribution analyzed by said analyzing means, wherein the likelihood is calculated by filtering using a mask unique to the feature to be extracted, and integrating results of the filtering; selection means for selecting at least one extraction module, from among a plurality of extraction modules which extract features of respective categories, whose calculated likelihood of the category for the feature of the second layer to be extracted from the input data is not less than a predetermined value; second feature extracting means for causing the selected extraction module to extract a feature of the second layer from the input data; and storing means for storing the extracted feature of the second layer in a memory.
17. A non-transitory computer-readable storage medium on which is stored a pattern identification program for allowing a computer to identify a pattern of input data by hierarchically extracting features of the input data, comprising: a first feature extraction step of extracting a feature of a first layer from the input data; an analysis step of analyzing a distribution of a feature extraction result in the first feature extraction step; a calculation step of calculating a respective likelihood of extracting from the input data a feature of one of a plurality of categories for features of a second layer, each feature of the second layer corresponding to a combination of features of the first layer, on the basis of the distribution analyzed in the analysis step, wherein the likelihood is calculated by filtering using a mask unique to the feature to be extracted, and integrating results of the filtering; a selection step of selecting at least one extraction module, among a plurality of extraction modules which extract features of respective categories, whose calculated likelihood of the category for the feature of the second layer to be extracted from the input data is not less than a predetermined value; a second feature extraction step of causing the selected extraction module to extract a feature of the second layer from the input data; and a storing step of storing the extracted feature of the second layer in a memory.